Memory corruption, 0-days, shellcodes...




Why should anyone care about 
shellcodes in 2012? 



CONS: 



Old exploitation technique, too 
old for Web-2.0-and-Clouds- 
Everywhere-World (some would 
say...) 

According to Microsoft's 2011 
stats*, user unawareness is the #1 
reason for malware propagation, 
and 0-days are less than 1% 

Endpoint security products deal 
with known malware quite well, 
so why should we care about 
the unknown?.. 



PROS: 

Memory corruption is still there ;-) 

Hey, Microsoft, we're all excited 
with MS12-020 

Tools like Metasploit are widely 
used by pentesters and blackhat 
community 

Targeted attacks on critical 
infrastructure - what about early 
detection? 

Endpoint security is mostly 
signature-based, and does not 
help with 0-days 
It's fun! ;) 



* http://download.microsoft.com/download/0/3/3/0331766E-3FC4-44E5-B1CA-2BDEB58211B8/Microsoft_Security_Intelligence_Report_volume_11_Zeroing_in_on_Malware_Propagation_Methods_English.pdf



CTF Madness 



Teams write 0-days from 
scratch 

Game traffic is full of exploits 
all the time 

Detection of shellcode allows 
to get hints about your vulns 
and ways of exploitation... 







Another POV: Privacy and Trust in 
Digital Era 




We share almost all aspects of our lives 
with digital devices (laptops, cellphones 
and so on) and Internet: 

• Bank accounts 

• Health records 

• Personal information 



Recent privacy issues with social 
networks and cloud providers: 

• LinkedIn password hashes leak 

• Foursquare vulns 

• What's next?.. 




Maybe the risk of 0-days will fade away? 
Vulnerability Disclosures 



• Modern software market for mobile and social applications is 
too competitive for developers to invest in security 

• Programmers work under time pressure, and managers prefer 
quantity over quality, etc. 

Despite significant efforts to improve code quality, the 
number of vulnerability disclosures continues to grow every year... 



Shellcode detection: what do we have 



Static analysis 

• Signature matching 

• Content-dependent 

• Behavioral 
(Hamsa[1], Polygraph[2]) 

• CFG\IFG analysis (SigFree[3]) 

• NOP-sled detection 
(Race Walk[4]) 

• APE[5] 

Dynamic analysis 

• Emulation ([6], [7]) 

• Automata analysis 
(IGPSA[8]) 

Hybrid analysis 

• ([9], PonyUnpack[10]) 

The problem: if we simply try to run all methods for each portion of 
data, it would be extremely slow 

Virtues and shortcomings 



Static methods 

+ Complete code coverage (theoretically) 
+ In most cases work faster than dynamic 

- The problem of metamorphic shellcode 
detection is undecidable 

- The problem of polymorphic shellcode 
detection is NP-complete 

Dynamic methods 

+ More resistant to obfuscation techniques 

- Require some overhead 

- Can consider only a few control flow paths 

- There are still anti-dynamic analysis 
techniques 



And more: 

• Methods with low computation complexity have high FP rate 

• Methods with low FP have high computation complexity 

• They also have problems detecting new types of 0-day 
exploits 



None of them is applicable for real network 
channels in real-time 



STAND BACK! 

I'M GOING TO TRY 

SCIENCE 



Why shellcode detection is feasible at all 
[Diagram: shellcode structure with Activator, Decryptor, Payload and RA blocks] 



As opposed to feature-rich 
viruses, shellcode has certain 
limitations in terms of size 
and structure 
[Diagram label: ENCRYPTED PAYLOAD] 



Proposed approach 



• Given a set of shellcode detection 
algorithms, why don't we try to construct 
an optimal data flow graph, so that: 

- the execution time and FP rate are optimized, 

- and the FN rate does not exceed a given 
threshold 
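
One way to read the goal above is as a constrained search over chains of classifiers. The sketch below is only an illustration under strong assumptions, not the Demorpheus algorithm: the classifier names and their cost, FN and FP numbers are hypothetical, classifier errors are assumed independent, and a chunk is assumed to be passed to the next classifier only if the previous one flagged it. Among all chains whose combined FN rate stays under the threshold, it picks the one with the lowest FP rate and breaks ties by expected cost on benign traffic.

```python
from itertools import permutations
from typing import NamedTuple, Optional, Sequence, Tuple


class Classifier(NamedTuple):
    name: str
    cost: float  # hypothetical processing time per unit of data
    fn: float    # hypothetical false negative rate, 0..1
    fp: float    # hypothetical false positive rate, 0..1


def chain_stats(chain: Sequence[Classifier]) -> Tuple[float, float, float]:
    """Return (expected_cost, fn_rate, fp_rate) of a cascade.

    Assumes independent errors and that only flagged data moves on,
    so benign data reaches stage i with probability fp_1 * ... * fp_(i-1).
    """
    expected_cost = 0.0
    reach = 1.0   # fraction of benign data reaching the current stage
    caught = 1.0  # probability that real shellcode passes every stage
    for clf in chain:
        expected_cost += reach * clf.cost
        reach *= clf.fp
        caught *= 1.0 - clf.fn
    return expected_cost, 1.0 - caught, reach


def best_chain(classifiers: Sequence[Classifier],
               max_fn: float) -> Optional[Tuple[Classifier, ...]]:
    """Brute-force search: lowest FP first, then lowest expected cost."""
    best, best_key = None, None
    for size in range(1, len(classifiers) + 1):
        for chain in permutations(classifiers, size):
            cost, fn, fp = chain_stats(chain)
            if fn > max_fn:
                continue  # violates the FN threshold
            key = (round(fp, 12), cost)
            if best_key is None or key < best_key:
                best, best_key = chain, key
    return best


# Hypothetical pool of detectors; the numbers are made up for illustration.
POOL = [
    Classifier("cheap", cost=1.0, fn=0.01, fp=0.30),
    Classifier("medium", cost=5.0, fn=0.02, fp=0.10),
    Classifier("precise", cost=20.0, fn=0.05, fp=0.01),
]

chosen = best_chain(POOL, max_fn=0.04)
print([clf.name for clf in chosen] if chosen else "no chain satisfies the FN threshold")
```

Brute force is only workable for a handful of classifiers. It is used here because it keeps the trade-off between execution time, FP and FN easy to inspect.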



Shellcode features 





Static analysis, generic features: 

• Correct disassembly into a chain of at least K instructions 

• Number of push-call patterns exceeds threshold 

• Overall shellcode size does not exceed threshold 

• Operands of self-modifying and indirect jmp are initialized 

• Cleared IFG contains a chain with more than N instructions 

Static analysis, specific features: 

• Correct disassembly from each and every offset (NOP) 

• Conditional jumps to the lower address offset (encrypted shellcode) 

• Ret address lies within a certain range of values (non-ASLR systems) 

• MEL exceeds threshold (NOP) 

• Presence of GetPC (encrypted shellcode) 

• Last instruction in the chain is a branch instruction with immediate 
or absolute addressing targeting a lib call or a valid interrupt 
(non-ASLR systems) 

Dynamic analysis, generic features: 

• Number of near reads within payload exceeds threshold R 

• Number of unique writes to different memory locations exceeds 
threshold W 

Dynamic analysis, specific features: 

• Control is transferred at least once from the executed payload to a 
previously written address (non-self-contained shellcode) 

• Execution of wx-instructions exceeds threshold X (non-self-contained 
shellcode) 
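
Two of the static features listed above, presence of GetPC and conditional jumps to lower addresses, can be approximated with a raw byte scan. The sketch below is a naive illustration for 32-bit x86, not the checks Demorpheus performs: the function names are hypothetical, only two well-known GetPC idioms are covered (`call $+5` followed by `pop r32`, and `fnstenv [esp-0xc]`), and because no real disassembly is done it will also match bytes that are not instruction starts.

```python
GETPC_CALL = b"\xe8\x00\x00\x00\x00"  # call $+5 (rel32 displacement of zero)
GETPC_FNSTENV = b"\xd9\x74\x24\xf4"   # fnstenv [esp-0xc]


def has_getpc(data: bytes) -> bool:
    """Look for two common GetPC byte patterns."""
    start = 0
    while True:
        pos = data.find(GETPC_CALL, start)
        if pos < 0:
            break
        nxt = pos + len(GETPC_CALL)
        # pop eax..edi are the single-byte opcodes 0x58..0x5F
        if nxt < len(data) and 0x58 <= data[nxt] <= 0x5F:
            return True
        start = pos + 1
    return GETPC_FNSTENV in data


def count_backward_short_jumps(data: bytes) -> int:
    """Count short branches whose 8-bit displacement is negative.

    Covers Jcc rel8 (0x70..0x7F) and LOOPNE/LOOPE/LOOP/JECXZ (0xE0..0xE3).
    A displacement byte >= 0x80 is negative, i.e. the target is a lower address.
    """
    count = 0
    for i in range(len(data) - 1):
        opcode, disp = data[i], data[i + 1]
        is_short_branch = 0x70 <= opcode <= 0x7F or 0xE0 <= opcode <= 0xE3
        if is_short_branch and disp >= 0x80:
            count += 1
    return count


# Toy decoder stub: call $+5; pop esi; xor byte [esi], 0x41; inc esi; loop back
STUB = b"\xe8\x00\x00\x00\x00\x5e\x80\x36\x41\x46\xe2\xfa"
print(has_getpc(STUB), count_backward_short_jumps(STUB))
```

A byte scan like this is cheap but prone to false positives on binary data, which is the kind of trade-off between computation cost and FP rate the talk discusses.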



Shellcode classes 



SH = {m_1, ..., m_ns} - set of shellcode features, BEN = {l_1, ..., l_nb} - set of 
benign code features. 

Shellcode space S is split into classes with respect to the identified 
shellcode features. 




Shellcode classes. Example 
Class K_sc (self-ciphered): 

• Multi-byte instructions 

• Overall size does not exceed a certain threshold 

Specific features: 

• Conditional jmps to the lower address offsets 



• In total, we identified 19 classes 



NOTE: none of the existing shellcode detection methods provides complete 
coverage of the identified classes 



Demorpheus: shellcode detection 
library and tool 




[Diagram: common basic steps, elementary classifiers, hybrid shellcode detector] 




Topology example 



Example of flow reducing 



Building hybrid classifier 



Round 1 Round 2 ... Round 




Evaluation: base line 



One of the important goals is minimizing the false positive rate 

Hybrid topology vs. linear topology 




• Minimum false positives rate 

• No flow reducing 
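
The difference between running every classifier on all data and reducing the flow between stages can be sketched as follows. This is an interpretation, not the talk's implementation: it assumes that a linear topology without flow reducing applies every classifier to every chunk and flags a chunk only if all of them agree, while flow reducing forwards to the next classifier only the chunks the previous one flagged. The two toy classifiers are hypothetical stand-ins for real detectors.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[bytes], bool]  # True means "looks like shellcode"


def run_without_flow_reducing(classifiers: Sequence[Classifier],
                              chunks: Sequence[bytes]) -> Tuple[List[bytes], int]:
    """Every classifier sees every chunk; flag only on unanimous agreement."""
    calls = 0
    flagged = []
    for chunk in chunks:
        verdicts = []
        for clf in classifiers:
            verdicts.append(clf(chunk))
            calls += 1
        if all(verdicts):
            flagged.append(chunk)
    return flagged, calls


def run_with_flow_reducing(classifiers: Sequence[Classifier],
                           chunks: Sequence[bytes]) -> Tuple[List[bytes], int]:
    """Each stage only receives the chunks flagged by the previous stage."""
    calls = 0
    flow = list(chunks)
    for clf in classifiers:
        survivors = []
        for chunk in flow:
            calls += 1
            if clf(chunk):
                survivors.append(chunk)
        flow = survivors
    return flow, calls


# Hypothetical toy classifiers, cheapest first.
def long_enough(chunk: bytes) -> bool:
    return len(chunk) >= 8


def has_nop_run(chunk: bytes) -> bool:
    return b"\x90" * 4 in chunk


CHUNKS = [b"GET /", b"\x90" * 8 + b"\xcc", b"plain text data"]
STAGES = [long_enough, has_nop_run]

print(run_without_flow_reducing(STAGES, CHUNKS)[1])
print(run_with_flow_reducing(STAGES, CHUNKS)[1])
```

Both functions flag the same chunks under this model. Flow reducing only changes how many classifier invocations are needed, which is where a throughput gain would come from.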



Experimental results: numbers 



Data set        | Linear                                     | Hybrid 
                | FN, *100% | FP, *100% | Throughput, Mb/sec | FN, *100% | FP, *100% | Throughput, Mb/sec 
Exploits        | 0.2       | n/a       | 0.069              | 0.2       | n/a       | 0.11 
Benign binaries | n/a       | 0.0064    | 0.15               | n/a       | 0.019     | 2.36 
Random data     | n/a       | (blank)   | 0.11               | n/a       | (blank)   | 3.7 
Multimedia      | n/a       | 0.005     | 0.08               | n/a       | 0.04      | 3.62 
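
The throughput figures reported above can be turned into speedup ratios with a few lines of arithmetic. The sketch below copies the linear and hybrid throughput values for each data set and divides one by the other. The variable and function names are hypothetical, and the pairing of values to the linear and hybrid columns follows the order in which they appear in the slide.

```python
# (linear, hybrid) throughput in Mb/sec, copied from the results above.
THROUGHPUT = {
    "Exploits": (0.069, 0.11),
    "Benign binaries": (0.15, 2.36),
    "Random data": (0.11, 3.7),
    "Multimedia": (0.08, 3.62),
}


def speedups(table):
    """Return hybrid/linear throughput ratio per data set."""
    return {name: hybrid / linear for name, (linear, hybrid) in table.items()}


for name, ratio in speedups(THROUGHPUT).items():
    print(f"{name}: {ratio:.1f}x")
```

The largest ratio comes from the multimedia data set, 3.62 / 0.08 = 45.25, while on the exploits data set the gain is much smaller, about 1.6 times.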



Experimental results: plots 




A couple of use-cases for hybrid 
classifier 

0-day exploit detection and filtering at network level 

CTF participation experience: 

- could help to increase the defense level of the team 

- could help to gather ideas from other teams 




Linked list of packets 



TCP reassembly 




Hybrid detector 
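
The pipeline above (packets, TCP reassembly, detector) can be sketched in a few lines. This is a simplified illustration, not the Demorpheus code: packets are modelled as hypothetical `(seq, payload)` tuples in a Python list rather than a linked list, sequence number wraparound and connection tracking are ignored, and the detector is a placeholder callback that merely looks for a short run of NOP bytes.

```python
from typing import Callable, Iterable, Tuple

Segment = Tuple[int, bytes]  # (TCP sequence number, payload)


def reassemble(segments: Iterable[Segment], initial_seq: int) -> bytes:
    """Rebuild the contiguous byte stream that starts at initial_seq.

    Handles out-of-order, duplicate and overlapping segments.
    Stops at the first gap because the missing bytes are unknown.
    """
    stream = bytearray()
    next_seq = initial_seq
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if seq > next_seq:
            break  # hole in the stream
        skip = next_seq - seq  # bytes already seen
        if skip < len(payload):
            stream += payload[skip:]
            next_seq = seq + len(payload)
    return bytes(stream)


def scan_stream(segments: Iterable[Segment], initial_seq: int,
                detector: Callable[[bytes], bool]) -> bool:
    """Feed the reassembled stream to a detector callback."""
    return detector(reassemble(segments, initial_seq))


def toy_detector(stream: bytes) -> bool:
    # Placeholder for a real hybrid detector.
    return b"\x90" * 4 in stream


# Out-of-order capture with one duplicate; the NOP run spans two packets.
CAPTURE = [(1004, b"\x90\x90BB"), (1000, b"AA\x90\x90"), (1000, b"AA\x90\x90")]
print(scan_stream(CAPTURE, 1000, toy_detector))
```

The example shows why reassembly comes before detection: neither packet alone contains the four-byte pattern, but the reassembled stream does.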



Demonstration here 



EXPERIMENT 
IN PROGRESS 



Conclusion 



• Good news everyone! 

• Shellcodes can now be detected up to 45 
times faster than before 

• You can download the Demorpheus source 
code from 

git@gitorious.org:demorpheus/demorpheus.git and 
integrate it with your own tool 